FourierPIM: High-Throughput In-Memory Fast Fourier Transform and Polynomial Multiplication
The Discrete Fourier Transform (DFT) is essential for various applications
ranging from signal processing to convolution and polynomial multiplication.
The groundbreaking Fast Fourier Transform (FFT) algorithm reduces DFT time
complexity from the naive O(n^2) to O(n log n), and recent works have sought
further acceleration through parallel architectures such as GPUs.
Unfortunately, accelerators such as GPUs cannot exploit their full computing
capabilities as memory access becomes the bottleneck. Therefore, this paper
accelerates the FFT algorithm using digital Processing-in-Memory (PIM)
architectures that shift computation into the memory by exploiting physical
devices capable of storage and logic (e.g., memristors). We propose an O(log n)
in-memory FFT algorithm that can also be performed in parallel across multiple
arrays for high-throughput batched execution, supporting both fixed-point and
floating-point numbers. Through the convolution theorem, we extend this
algorithm to O(log n) polynomial multiplication - a fundamental task for
applications such as cryptography. We evaluate FourierPIM on a
publicly-available cycle-accurate simulator that verifies both correctness and
performance, and demonstrate 5-15x throughput and 4-13x energy improvement over
the NVIDIA cuFFT library on state-of-the-art GPUs for FFT and polynomial
multiplication.
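As a point of reference for the convolution-theorem step mentioned in the abstract, the following is a minimal software sketch of FFT-based polynomial multiplication. It is a conventional serial O(n log n) radix-2 implementation in plain Python, not the in-memory algorithm proposed in the paper; the function names `fft` and `poly_multiply` are illustrative assumptions.

```python
import cmath


def fft(values, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT (unnormalized).

    Assumes len(values) is a power of two. With invert=True the
    twiddle sign is flipped; the caller divides by n afterwards.
    """
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def poly_multiply(a, b):
    """Multiply two integer-coefficient polynomials (lowest degree first).

    Convolution theorem: transform both inputs, multiply pointwise,
    then transform back.
    """
    result_len = len(a) + len(b) - 1
    size = 1
    while size < result_len:  # pad to a power of two
        size *= 2
    fa = fft([complex(x) for x in a] + [0j] * (size - len(a)))
    fb = fft([complex(x) for x in b] + [0j] * (size - len(b)))
    product = fft([x * y for x, y in zip(fa, fb)], invert=True)
    # Normalize the inverse transform and round back to integers.
    return [round((c / size).real) for c in product[:result_len]]
```

For example, `poly_multiply([1, 2], [3, 4])` computes (1 + 2x)(3 + 4x) and returns `[3, 10, 8]`. Rounding to integers is only safe while the coefficients stay small enough for double-precision arithmetic.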
ClaPIM: Scalable Sequence CLAssification using Processing-In-Memory
DNA sequence classification is a fundamental task in computational biology
with vast implications for applications such as disease prevention and drug
design. Fast, high-quality sequence classifiers are therefore of significant
importance. This paper introduces ClaPIM, a scalable DNA sequence classification
architecture based on the emerging concept of hybrid in-crossbar and
near-crossbar memristive processing-in-memory (PIM). We enable efficient and
high-quality classification by uniting the filter and search stages within a
single algorithm. Specifically, we propose a custom filtering technique that
drastically narrows the search space and a search approach that facilitates
approximate string matching through a distance function. ClaPIM is the first
PIM architecture for scalable approximate string matching that benefits from
the high density of memristive crossbar arrays and the massive computational
parallelism of PIM. Compared with Kraken2, a state-of-the-art software
classifier, ClaPIM provides significantly higher classification quality (up to
20x improvement in F1 score) and also demonstrates a 1.8x throughput
improvement. Compared with EDAM, a recently-proposed SRAM-based accelerator
that is restricted to small datasets, we observe both a 30.4x improvement in
normalized throughput per area and a 7% increase in classification precision.
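The filter-then-search idea in the ClaPIM abstract can be illustrated with a small software sketch. The abstract does not specify the filter or the distance function, so the choices here are assumptions made purely for illustration: a shared k-mer test stands in for the filter, Hamming distance stands in for the distance function, and the names `kmers`, `hamming`, and `classify` are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hamming(a, b):
    """Number of mismatching positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def classify(read, references, k=4, max_distance=2):
    """Return the label of the best-matching reference, or None.

    references maps a label to a reference sequence.
    """
    read_kmers = kmers(read, k)
    best_label, best_dist = None, max_distance + 1
    for label, ref in references.items():
        # Filter stage (assumed): skip references that share no k-mer
        # with the read, which narrows the search space.
        if not (read_kmers & kmers(ref, k)):
            continue
        # Search stage (assumed): approximate matching by sliding the
        # read over the reference and keeping the smallest distance.
        for i in range(len(ref) - len(read) + 1):
            d = hamming(read, ref[i:i + len(read)])
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label
```

With `references = {"A": "ACGTACGTAAGG", "B": "TTTTGGGGCCCC"}`, the read `"ACGTACCT"` differs from a window of reference A in one position and is classified as `"A"`, while a read sharing no 4-mer with either reference is rejected by the filter and yields `None`.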
The Bitlet model: a parameterized analytical model to compare PIM and CPU systems
Currently, data-intensive applications are gaining popularity. Together with this trend, processing-in-memory (PIM)-based systems are being given more attention and have become more relevant. This article describes an analytical modeling tool called Bitlet that can be used in a parameterized fashion to estimate the performance and power/energy of a PIM-based system and, thereby, assess the affinity of workloads for PIM as opposed to traditional computing. The tool uncovers interesting trade-offs between, mainly, the PIM computation complexity (cycles required to perform a computation through PIM), the amount of memory used for PIM, the system memory bandwidth, and the data transfer size. Despite its simplicity, the model reveals new insights when applied to real-life examples. The model is demonstrated for several synthetic examples and then applied to explore the influence of different parameters on two systems - IMAGING and FloatPIM. Based on the demonstrations, insights about PIM and its combination with a CPU are provided.
This work was supported by the European Research Council through the European Union’s Horizon 2020 Research and Innovation Programme under Grant No. 757259 and by the Israel Science Foundation under Grant No. 1514/17.
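The trade-offs named in the Bitlet abstract (PIM computation complexity, memory used for PIM, memory bandwidth, and data transfer size) can be explored with a small throughput calculator. The sketch below is a simplified illustration, not the model as published: the parameter names, the formulas, and the assumption that the PIM and CPU phases run one after the other are all assumptions made here.

```python
def pim_throughput(rows, arrays, cycles_per_op, cycle_time_s):
    """Operations per second for PIM (assumed form).

    Assumes every row of every memory array computes in parallel and
    each operation takes cycles_per_op cycles of cycle_time_s seconds.
    """
    return rows * arrays / (cycles_per_op * cycle_time_s)


def cpu_throughput(bandwidth_bits_per_s, bits_per_op):
    """Operations per second for a bandwidth-bound CPU (assumed form).

    Assumes each operation moves bits_per_op bits over the memory bus.
    """
    return bandwidth_bits_per_s / bits_per_op


def combined_throughput(tp_pim, tp_cpu):
    """Throughput when the PIM and CPU phases run serially (assumption).

    Per-operation times add, so the rates combine harmonically.
    """
    return 1.0 / (1.0 / tp_pim + 1.0 / tp_cpu)
```

For instance, `combined_throughput(100.0, 100.0)` gives about 50 operations per second, and the combined rate is always below the slower of the two phases. Raising `cycles_per_op` or `bits_per_op` shows which side becomes the bottleneck.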